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We derive a general ansatz for optimizing pseudo-Cf estimators used to measure CMB anisotropy 
\Q , power spectra, and apply it to the recently-proposed pure pseudo-Cf formalism, to obtain an estima- 

tor which achieves near-optimal B-mode power spectrum errors for any specified noise distribution 
while minimizing leakage from ambiguous modes. Our technique should be relevant for upcoming 
CMB polarization experiments searching for B-mode polarization. We compare our technique both 
to the theoretical limits based on a full Fisher matrix calculation and to the standard pseudo-Cf 
technique. We demonstrate it by applying it to a fiducial survey with realistic inhomogeneous noise, 
complex boundaries, point source masking, and noise level comparable to what is expected for next 
generation experiments (^5.75 /iK-arcmin). For such an experiment our technique could improve 
the constraints on the amplitude of a gravity wave background by over a factor of ten compared to 
what could be obtained using ordinary pseudo-Cf, coming quite close to saturating the theoretical 
limit. Constraints on the amplitude of the lensing B-modes are improved by about a factor of 3. 

on 
in 

O ' !• INTRODUCTION 
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The polarization of the Cosmic Microwave Background (CMB) could provide a unique window into the very early 
Universe if its pattern can decomposed into gradient and curl (or E and B) parts. The detection of large scale i?-mode 
• polarization would indicate the presence of a stochastic background of gravitational waves left over from the epoch 
of Inflation [6, 11]. The amplitude of the background would allow us to directly deduce the energy scale of Inflation, 
I i dramatically extending our understanding of the history of Universe to its very beginnings. B-mode polarization 
on small angular scales is mainly produced by the gravitational lensing of the B-mode component [20] . A detailed 
measure of these contributions could help us constrain various cosmological parameters such as the equation of state 
' of the dark energy or the mass of the neutrinos. 

The clear prospect of studying the birth of our Universe and improving constraints on some cosmological parameters 
using the CMB polarization has led to remarkable progress in detector technology with the promise to deliver the 
k> required sensitivity to detect the small B polarization signal. In the near future experiments targeting .B-mode 
polarization will be performed from the ground or from balloons. These experiments will cover relatively small 
fractions of the sky with the aim of producing high fidelity polarization maps that can be decomposed into E and B. 

In a finite patch of sky, the E-B decomposition framework needs modification. It is no longer true that any polar- 
ization field can be uniquely decomposed into pure E and pure B-modes. A new set of modes, so-called "ambiguous" , 
needs to be introduced [1]. These new modes receive contributions from both E and -B-modes. In practice because 
the E signal is expected to be so much larger than the B one, the ambiguous modes will be dominated by E signal. 
Thus all the information about the cosmological .B-modes is contained in the pure B-modes, which are orthogonal 
to both pure E and ambiguous modes. The leakage of E signal into B can be thought of as an additional source of 
noise on top of that introduced by the detector, but a source of noise that can be eliminated by a suitable choice of 
variables. 

Analyzing the data of the next generation of experiments is potentially quite challenging. Measuring the power 
spectra using a fully optimal likelihood based analysis requires 0(Np ix ) operations and is probably not feasible for 
this next generation of experiments. The alternative technique, widely used for CMB temperature experiments, is the 
pseudo-Q quadratic method [17]. Unfortunately in a finite patch of the sky the pseudo-Cf 's do not isolate the pure B- 
modes before constructing the quadratic estimators. As a result the quadratic estimator is contamined by the leakage 
from the ambiguous modes. This contamination can be removed on average by taking suitable linear combinations of 
the pseudo-Cf 's. However the leaked power still contributes to the variance of the estimators, significantly degrading 
their ability to measure the amplitude of the cosmological B-modes. 

Recently an improved technique, a pure pseudo-Cf technique was introduced [12]. This technique preserves the 
simplicity of the pseudo-Cf 's but at the same time ensures that no E — > B mixing occurs. In [12] it was also shown 
that the pure pseudo-Cf 's preserve most of the cosmological information in the data. 

In this paper we will look at the pure pseudo-Cf from a new perspective. Our different approach will allow us to 
understand various features of the pure pseudo-Cf 's and also give guidance as to how to choose the weight function that 
the technique requires. The main result of the paper, a general ansatz for optimizing the pseudo-Cf weight function 
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given arbitrary signal and noise covariance, is presented in §V. Using this ansatz to generate weight functions in 
an automated way, we study several mock surveys, culminating in a realistic simulation of a fiducial experiment, 
with characteristics based on the upcoming balloon based EBEX [8]. Our simulations have inhomogcneous noise, 
complicated boundaries and point source masking. We will argue that the pseudo-C^ estimators give near-optimal 
power spectrum errors on all angular scales. This resolves a long-standing practical issue in the pseudo-C^ method: 
the lack of an algorithm for choosing weight functions in complex situations. Previous studies have had to rely on 
optimizing weight functions by hand, using expensive Monte Carlo simulations to compute the estimator variance 
for each candidate weight function considered. At the same time, we preserve the strong E-B separation of the 
pure pseudo-CV method, achieving excellent B-mode errors even for the low noise levels anticipated for the upcoming 
generation of CMB polarization experiments [8, 10, 15, 18]. 



II. PRELIMINARIES 
A. Spin two notation 

First we will introduce our notation. We will follow closely [1] . The (linear) polarization of the CMB is described 
in terms of the Stokes parameters Q and U. The definition of Q and U depends on the coordinate system chosen. In 
this subsection we review definitions that are valid for the full sky, so we will use spherical coordinates to define Q 
and U. 

We will follow the notation of [19]. The Stokes parameters can be combined to form a spin-2 combination (Q + ill) 
and a spin-(— 2) combination (Q — ill). In the full sky these combinations can be decomposed using spin-2 harmonics, 

Q + iU = ^2a 2 ,i m 2Y lm ; Q - iU = ^ a_ 2 ,; TO -iY lm (1) 

Ira Ira 

It is natural to introduce a scalar (E) and a pseudoscalar (B) field to describe polarization. The expansion 
coefficients of these two fields in (ordinary spin-0) spherical harmonics are 

aE,lm = -{Cl2,lm + G-2,im)/2 | a>B,lm = i(a2,lm ~ 0-2,Jm)/2. (2) 

These two functions completely characterize any polarization field on the sphere [19]. They are important physically 
because cosmological density perturbations cannot create S-type polarization while gravity waves can [7, 19]. 

The spin-2 harmonics in equation (1) can be related to the usual spin-0 spherical harmonics by means of two 
first-order differential operators, the spin-raising (d ) and spin-lowering (d ) operators [19]. When applied to the 
spin-weighted spherical harmonics, these operators yield the following identities: 

d.Y lm = [{l-s)(l + s + l)}^ 2 s+1 Y lm 

3 s Y lm = -[(* + *)(/ -« + i)]V2 s _ lYlm . (3) 

We can show that: 

X e = [d6(Q + iU)+d6(Q-iU)]/2 

= -£[(/ + 2)!/(Z - 2)\} 1 / 2 a E , lm Y lm 

Im 

Xb = i[8S(Q + iU)-d6(Q-iU)}/2 

= ^[G + 2)!/(/-2)!] 1 / 2 a B , m r ;m . (4) 

Ira 

Thus we can take linear combinations of second derivatives of the Stokes parameters and obtain variables that depend 
only on E or on B. 

We pause to note that the reason why E and B are the focus of attention instead of xe, Xb is partly a matter 
of convention. Perhaps more importantly, E and B have the same power spectrum on small scales as the Stokes 
parameters, while that of x differs by a factor of ~ I 4 . As a result while white noise in Q and U translates into white 
noise in E and B, it becomes colored noise for x- 
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B. Small-angle approximation 

If one works over a small patch of sky, one can use the small-angle (flat-sky) approximation. When working in 
this limit, it is more natural to measure the Stokes parameters with respect to a Cartesian coordinate system (x, y) 
instead of the usual polar coordinate axis. In the flat-sky approximation polarization is expanded in terms of Fourier 
modes, 

(5) 

where x = (x,y) and 1 = I (cos </>, sin </>). The differential operators reduce to simply 

6 = -(d X +idy), (6) 

6 = -(d x -id y ). (7) 

C. The ambiguous modes 

On a manifold without boundary, any polarization field can be uniquely separated into an E part and a B part. 
But if there is a boundary (i.e., if only some subset ft of the sky has been observed), this decomposition is not unique. 
In this section we summarize the results of [I]. 

Polarization fields living on form a normed vector space with the inner product 

/ (QQ' + UU')<m, (8) 
Jn 

and we say that two fields (Q + ill) and (Q' + ill') are orthogonal if their inner product vanishes. We refer to a 
polarization field as 

• E if it has vanishing xb, 

• B if it has vanishing xe, 

• pure E if it is orthogonal to all £>-fields, and 

• pure B if it is orthogonal to all _E-ficlds. 

On the complete sky, every polarization field can be uniquely represented as a linear combination of an E field and 
a B field, and all E fields are perpendicular to all B fields. In other words, the space of all polarization fields is the 
direct sum of two orthogonal subspaces: the space of all E fields and the space of all B fields. In this case, there is 
no distinction between an E field and a "pure E" field. 

But if only some subset of the sky has been observed, so that is a manifold with boundary, then this decomposition 
is not unique. One way to see this is to note that there are modes that satisfy both the E-mode and B-mode conditions 
simultaneously. When we split a polarization field into an E part and a B part, these "ambiguous" modes can go into 
either component. In order to make the E/B decomposition unique, we must first project out the ambiguous modes. 

The existence of ambiguous modes has important implications at the time of measuring the £?-mode power spectrum. 
Here we concentrate on "simple" quadratic methods that make no attempt to filter those ambiguous modes at the 
level of the map (such as [17]). A full likelihood analysis would automatically deal with the ambiguous modes but it 
will probably be computationally prohibitive. It might be feasible for low resolution maps, becoming the method of 
choice for constraining the gravitational wave amplitude [3] . 

It is easiest to understand the issue to focus on the pseudo-CV method, but the point applies generally. In the 
pseudo-Cf method one just measures the power spectra of E and B as one would do in the full sky and simply 
masks out the unobserved regions. After this procedure both Cf and Cf spectra are dominated by £-modes. 
An illustrative example is shown in figure 1. We considered a circular region 10° in radius. The E and B power 
spectra were measured by simply masking the unobserved region but otherwise proceeding as if one had a full sky 
measurement. The calculation was performed in the flat sky approximation which is adequate for this size region. 
The input polarization field had _E-modes only. In the figure we show the measured B mode power spectrum spectra, 
which is produced by leakage of the i?-modes. For comparison we show the expected -B-mode power spectrum if 
T/S = 0.1 together with the B- modes produced by gravitational lensing. Also for reference we plot the detector noise 
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FIG. 1: Illustration of the leakage due to finite sky coverage in the ordinary pseudo-C^ technique. A circular region 10 degrees 
in radius was used to estimate the pseudo-CV 's. The leakage curve shows the B-modes obtained even in the absence of any 
cosmological B-modes just from the leakage of E. For comparison we also show the expected cosmological signal for T/S = 0.1 

— 1/2 

and the noise power spectrum for an experiment with w P = 5.75 /iK-arcmin and an 8 arcminute beam. 

— 1/2 

power spectrum for w p =5.75 /iK-arcmin and an 8 arcminute beam. This level of noise is within the range of 
expected values for the upcoming EBEX balloon experiment [8]. 

The pseudo-Cf method then takes linear combinations of the measured CVs to eliminate the _E-mode contribution 
in the hopes of revealing the cosmological B-modes. The point is that this procedure makes the B-mode power spectra 
estimator independent of E in the mean, but the variance of the estimator is still dominated by the leaked power. 

Perhaps an extreme example is helpful to clarify the problem further. Imagine that cosmological parameters were 
known perfectly so that there is no uncertainty in the B-mode spectra. Then the leaked B power is perfectly predicted 
and can be subtracted from the measured signal to leave an unbiased Cf estimator. This is analogous to what one 
does with detector noise. Unfortunately just as with noise the variance of the estimator is increased by the leaked 
power. The pseudo-CV technique does even worse than this because it does not assume any knowledge of the Cf so 
it uses only linear combinations that are on average independent of E for any choice of Cf. (In reality for a small 
patch this is not possible and some assumptions such as constancy of the E power spectrum in bands must be used. 
These assumptions are expected to be reasonably well satisfied.) 

Figure 1 illustrates the severity of the problem for experiments with noise levels comparable to those of our fiducial 
experiment (5.75 /iK-arcmin). The leaked power is larger than the detector noise all the way to I ~ 400, so using the 
pseudo-Q technique is clearly wasteful. Note that even on scales where the lensing signal dominates there would be 
significant degradation. 

III. AVOIDING LEAKAGE BY MEASURING X b 

When trying to measure the -B-mode power spectrum in a finite patch there is a simple way to avoid unwanted 
leakage from B-modes: estimate the power spectrum starting from the \b map. In the next section we will show 
that this is mathematically equivalent to the pure pseudo-C^ technique. It is important to realize that in practice 
no information is being lost by doing this. By construction \b has no contribution from either pure B-modes or 
ambiguous modes. Although in principle there is some information on the amplitude of B-modes in the ambiguous 
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modes, in practice that signal is completely swamped by the contribution of the _E-modes so there is no real loss of 
information by working with xb- 

Perhaps an analogy helps at this point. One could consider a scalar field, like the CMB temperature, and wonder 
whether there is any loss of information if one were to compute the Laplacian of the map and extract the power 
spectrum from that. In the full sky there would be no loss of information but in a small patch one would lose. In the 
full sky there is a one to one relation between the Laplacian map and the underlying temperature map (except for the 
monopole and dipolc which are lost when one takes the Laplacian, but they do not carry cosmological information); 
the only difference is that both signal and noise power spectra will be multiplied by £ 2 (£ + l) 2 which does not alter 
how well one can constrain each multipolc. In a finite patch the situation is different because there are non-zero 
temperature fields on the finite patch that can have zero Laplacian. Those modes are filtered out by the "Laplacian- 
technique" with the resulting loss of information. In our case the analogue of the modes that are being filtered out 
are the ambiguous modes which we want to remove anyway. Thus for constraining the B-mode amplitude, starting 
with the xb field is sufficient. 

Of course the fact that the xb map contains for practical purposes all the cosmological information does not mean 
that if we extract its power spectrum in a non-optimal way we will still conserve all the information in the data. If 
in practice (as we will do in later sections) we use a pseudo-C^ technique to extract power spectra, it may be that 
allowing some leakage is still advantageous. As long as the leaked power is small compared to the detector noise it 
is most certainly harmless. We will see that it is indeed the case that small amounts of leakage are preferred by our 
technique. We will postpone discussion of this point to later. 

Once we conclude that measuring the power spectrum starting from xb avoids the unwanted leakage there are 
various technical issues one needs to address to implement such a method in practice. Moreover, we will show that 
this technique is mathematically equivalent to the pure pseudo-C^ technique and we will find it easier to implement 
our algorithms using that perspective. We do want to comment on two aspects; first, in practice one needs to take 
the derivatives by finite differencing which will introduce some error and residual leakage. This residual leakage will 
constrain the choice of pixclization. We discuss this in more detail in Appendix A. Second, the xb field has a very 
blue spectrum which will affect standard estimation techniques. We will discuss this second issue here because it will 
illuminate our discussions down the line. 



The additional derivatives needed to construct xb make its power spectrum very blue. We will show in this section 
that this induces high levels of aliasing unless appropriate apodizations are used. Thus apodization will be an integral 
part of any technique that starts with xb- 

To understand the issue it is best to write down the formulas in the flat sky approximation. We assume that the 
Stokes parameters have been measured in a certain region of the sky and that Xb was calculated by finite differencing. 
Then the xb field is multiplied by a window function W (in real space) and the Fourier components of (Wxb) 
are measured and squared to estimate the power spectrum. The weight W is zero outside the observed patch. If 
derivatives are taken using nearest neighbors, one also loses an additional pixel around the edge of the map. Note 
that this loss is directly related to the counting of ambiguous modes as described in [1]. 

The resulting power estimates are the convolution of the xb power spectrum and the Fourier transform of the 
window, 



The xb power spectrum is now so blue that unless the window function decreases fast with /, all estimates of power 
will be dominated by the small scale modes. 

One way of seeing why this is a problem is to consider normalizing the window such that ||M^(0)|| 2 = 1. The 
estimator in equation (9) estimates the power at 1 but has aliasing from other modes. Even if one considered the 
idealized situation in which the higher I power spectra were known so that the aliasing could be calculated and 
subtracted, the aliased power would still contribute to the noise of the estimator. If the aliased power dominates 
the integral in Eq. (9), the additional variance is large compared to the signal one is after with the resulting loss of 
information. Thus one would want to avoid having such aliasing if possible. 

The high I behavior of ||W(1)|| 2 is directly related to the number of normal derivatives of W that vanish at the 
boundary of the patch. In fact 



A. A very blue spectrum: the need for apodization 




(9) 



lim ||T^(1)|| 2 ocr 2(n+2) , 



(10) 
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where n is the number of derivatives of the window that vanish at the boundary with n — meaning that the window 
itself is continuous. Thus requiring that equation (9) converges requires using a window that is smoother at the 
boundary. For example if we compare measuring the power spectra of the Stokes parameters without apodization to 
measuring the power spectrum of xb, compensating the additional I 4 factor in the xb spectrum requires having both 
the window and its first derivative vanish. 

Reducing the aliasing requires considering weights that decrease continuously to zero as one gets to the border; on 
the other hand, if too much apodization is used, then one is effectively using a smaller patch of sky thus decreasing 
the signal to noise. We will use an improved version of this argument to find an optimal choice of W in section V. 
Here we just want to mention that when dealing with xb, because of its very blue nature, we are forced to confront 
the issue of apodization straight on. 

At this point the reader may wonder if using xb is such a good idea as one has to take derivatives of the data, 
deal with extremely blue spectra, etc. In the next section we will show that our technique is equivalent to the "pure" 
pseudo-C^ technique. Thus all the intuition we gain focusing on xb will be directly translated to that other method. 
The pure pseudo-CV formalism also clarifies the issue of statistical weight; we will see that, even though the apodized 
window may suggest that the edges of the survey are downweighted, the statistical weight is in fact roughly uniformly 
distributed for the optimal choice of apodization. 



IV. RELATION TO THE PURE PSEUDO-C/S 

We have now seen that the problem of estimating the B-mode power spectrum (without contamination from E- 
modes) can be reduced to estimating the power spectrum of a scalar field, by passing to the curl xb- In principle, 
any method for power spectrum estimation can be applied to xb, to obtain £>-mode estimators which do not suffer 
from E — > B mixing. We will consider a special case which will be the focus of the rest of the paper: estimating the 
power spectrum of xb using pscudo-C^ estimators. 

Let us recall how pseudo-CV power spectrum estimators are defined for the scalar field xb- First, one computes 
multipolcs of the weighted field (or "pseudo multipoles" ) : 



def 1 



^/(£ -!)£(£ +!)(£ + 2) 



J d 2 x X B(x)W(x)Y e * m (x) (11) 



where W(x) is a heuristically chosen weight function. The issue of choosing W(x) will be studied in detail in subsequent 
sections. In (11), we have included the prefactor l/y/Jl — 1)1(1 + 1){£ + 2) so that the normalization will match the 
.B-mode power spectrum Cf B . 

We briefly describe the construction of unbiased bandpower estimators from the pseudo multipoles (11); for more 
details see [5, 17]. First we sum the multipoles in bins, obtaining pseudo bandpowers 

ieb m=-t 

where the index a runs over I bands. Then we define unbiased estimators by 

C a =K-i,(C a ,-N a ,) (13) 

where N a is the (additive) noise bias of each C a , and K aa / is the transfer matrix between pseudo bandpowers (12) 
and signal bandpowers A a > . (More precisely, N and K are defined by (C a ) = K aa >A a > + N a .) 

The preceding construction defines a power spectrum estimator for polarization which eliminates E — > B mixing 
in the strongest possible sense: the estimated i?-mode power acquires no contribution from _E-modes. In [12], "pure 
pseudo-Cf estimators" for polarization, which also eliminate E — > B mixing in this strong sense, were defined. We 
now prove that the two estimators arc mathematically equivalent. 

Substituting the definition (4) of xb into the definition of the multipolc (11) and integrating by parts, one obtains: 

«L = Tit d 2 x{Q{x)+iU{x))dd l ^ W +c.c. 
lm 2 J ' ^ (£-!)£(£ +!){£ + 2) 

= l -J Sx (Q(x) + iU{x)) [w(x)( 2 Yl m (x)) + — _ |= = = WT(*)(il7 m (*)) 

+ ; 1 W;{x)Y e * m (x)} +C.C. (14) 
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where we have defined the spin-1 and spin-2 quantities 

W 1 (x)=dW(x) W 2 (x) = SdW{x). (15) 

In the form (14), it is seen that a* m agrees with the pure multipole defined in [12], after changing from tensor notation 
to the spin-s notation used here. This shows that the pseudo multipoles in the two constructions are equal; since the 
rest of the pseudo-C^ construction (Eqs. (12), (13)) is also the same in both cases, this completes the proof that the 
Xb estimator and the pure pseudo-C^ estimator are equal. 

Since the estimators are mathematically equivalent, it is a matter of preference which to use, and we will switch 
between the two formalisms in the rest of this paper depending on which is more convenient. Using the xb estimator 
has the advantage that pure i?-mode power spectrum estimation is equivalent to power spectrum estimation of a 
scalar field with a very blue spectrum. As we will see shortly, this will allow us to formulate a unified procedure for 
optimizing the weight function W(x), which applies to both pure and ordinary pseudo-Q estimators. 

In the pure pseudo-CV formalism, the terms have been organized (Eq. (14)) so that the derivatives act only the 
weight function, rather than the spherical harmonic or the noisy polarization map. Provided that the weight function 
varies slowly compared to the pixel scale, this can make it easier to take derivatives numerically; one only has to 
compute Wi(x), W2(x). The first term in Eq. (14) is the usual pseudo-Q estimator for f?-modes; the remaining two 
terms are interpreted as higher-spin counterterms which cancel the E — > B mixing. We will see that this form of the 
estimator is convenient when optimizing the weight function numerically given a noise map (§VII). 

We conclude this section by discussing a general property of the estimator: the weight function must be apodized 
so that both W(x) and its derivative vanish at the boundary. It is illuminating to see how this requirement arises 
from both ways of defining the estimator. 

For the pure pseudo-C^ estimator, it is seen (from the mode in brackets in Eq. (14), which multiplies the Stokes 
parameters in a pixel) that the statistical weight of a pixel is given by a combination of W(x) and its first two 
derivatives. Therefore, if either W(x) or its derivative have nonzero boundary values, the statistical weight will contain 
a delta function on the boundary, which leads to a divergent estimator. (One might try to cure this divergence by 
defining W\, Wi with the delta function terms omitted from the derivatives; the estimator will then be finite, but since 
the relations (15) do not hold strictly, the estimator will mix E — > B.) This reasoning also shows that even though 
W(x) is apodized near the boundary, the statistical weight of pixels near the boundary need not be small, since the 
counterterms W\ and W<i will be largest there. In fact, for the optimized weight functions that we will consider in 
subsequent sections, we have found that the statistical weight is roughly uniformly distributed. 

For the xb estimator, the divergence arises in a different way. Although the estimator appears not to involve 
derivatives of W(x), the field xb has a noise power spectrum on small scales which grows as t A . If either W(x) or 
its derivative is nonzero on the boundary, then ||IU(0I| 2 decays as l/l 4 or slower for large I (Eq. (10)) which causes 
the integral (9) to diverge. From this perspective, the need to apodize is understood as a consequence of small-scale 
noise fluctuations in xb with a very blue power spectrum. 

V. CHOOSING THE OPTIMAL WEIGHT FUNCTION 

In pseudo-Q power spectrum estimation, an important practical issue is choosing the pixel weight function W(x) 
to minimize the variance of the estimator. Although no general procedure has been proposed, several rules of thumb 
for optimizing W{x) under different limiting conditions have appeared. In the noise-dominated limit, inverse noise 
weighting (W(x) = l/a 2 (x)) is optimal; in the signal-dominated limit, uniform weighting is best possible [2]. If the 
power spectrum is being estimated on scales smaller than fluctuations in the noise level, the FKP ansatz [4] has 
become the industry standard for galaxy surveys. 

The preceding rules of thumb implicitly assume that the signal + noise power spectrum is white on small scales. 
However, we have seen that when choosing the weight function for pure B-mode estimation, one includes an extra 
factor I 4 in the power spectrum. This qualitatively affects the optimization and invalidates the rules of thumb. For 
example, W(x) and its first derivative must vanish on survey boundaries, in contrast to the white noise case. 

In this section, we will describe a general ansatz for the optimal weight function, which makes sense for an arbitrary 
A^pix-by- Api x matrix C representing the total covariance (signal plus noise) . In particular, it can be applied to a noise 
power spectrum which grows as any power of £, and so provides a uniform framework for both pure and ordinary 
pseudo-C^ power spectrum estimation. We will see that our ansatz reproduces the above rules of thumb in the 
appropriate limits, and correctly apodizes W(x) if the power spectrum is blue on small scales. 

First we introduce some notation. The A p i x -by-A p i x signal covariance in the bandpower we are estimating will be 
denoted C", where a is an index ranging over £ bands, and we normalize it so that Cfj = 1. 
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If Mi and M 2 are two A p i x -by-A p i x matrices, then [Mi * M 2 ] denotes the A p i x -by-A p i x matrix obtained by elemen- 
twise multiplication: 

[Mi*M 2 ] ij = {M 1 ) ij (M 2 ) ij . (16) 

The multiplication is always performed in a basis where each row or column corresponds to one pixel; otherwise the 
operation defined by (16) would be basis-dependent. 

Now we can state the ansatz for optimizing the pseudo-CV weight function which will be the main result of this 
paper: the optimal weight function W opt (thought of as a length- A p j x vector) is given by 

W opt = [C a * C]- 1 ^ (17) 

where l ve c deontes the length- A p i x vector consisting of all l's. 

It may seem strange that the ansatz (17) includes the * operation (16) which is specific to the pixel-space basis. 
Loosely speaking, in pseudo-C^ power spectrum estimation, we are constrained to use a filter (multiplying by the 
mask) which is diagonal in pixel space, and so the pixel basis is given special significance. Before giving the derivation 
of the ansatz, we note some general properties. Without using the * operation, it seems to be impossible to achieve 
all of the following at once: 

1. If the noise is uncorrelated between pixels, and the noise dominates, then W opt corresponds to inverse noise 
weighting. This follows from Eq. (17) upon noting that C is diagonal in the pixel basis, say Cjj = crfSij, so that 
(C Q * C) = C and (W op t)i = c 4 ~ 2 . Note that in this case, pseudo-Q power spectrum estimation is optimal, in 
the sense that the Cramer-Rao inequality is saturated. 

2. If the signal + noise power spectrum grows as a power law £ A on small scales, and A > 0, then W opt is apodized 
near the boundary. The number of derivatives which vanish at the boundary is given by (A/2 — 1), in accord 
with the discussion at the end of §111 A. As the wavenumber £ where one is estimating the power spectrum 
increases, the apodization length decreases. These statements will be proved in §VB. 

3. Eq. (17) reduces to the FKP prescription in the regime where FKP applies. This is shown in Appendix B. 

The matrix form of the ansatz (17) may suggest that -/V p i x -by--/V p i x matrix operations are needed to compute the 
optimal weight function, which would be problematic since such operations are computationally prohibitive in situa- 
tions where pseudo-C^ estimators are used. (If they were feasible, then one could do a maximum likelihood analysis 
instead.) However, in Appendix C, we will see that a conjugate gradient approach may be used to compute the RHS 
of Eq. (17) even in surveys where the pixel count is large. 

A. Derivation 

Now we give a heuristic derivation of the ansatz (17). The idea is to find the weight function W such that the 
pseudo-Q estimator (defined in Eq. (12)) 

C a [d] = Y i d i WiCf j W j d j (18) 

ij 

is as close as possible to the optimal estimator 

O a [d] ^^d^C^CT- 1 )^,, (19) 

ij 

In Eqs. (18), (19) we have denoted the data vector (a length- A p i x vector whose covariance matrix is C) by d. 
How should "close" be defined? We first note that both estimators can be written as Monte Carlo averages 
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where the notation (■)#<* means that an expectation value is taken over an auxiliary field 3> Q with covariance C Q . 
From the expressions (20), a natural definition of "close" is that the expectation value 

£ ^f^^^ mdi _^^ c -i d .j ^ (21) 

be minimized. Here, the expectation value is taken over both the auxiliary field <& Q with covariance C° and the data 
vector d with covariance C. We minimize E by setting its derivative with respect to the weight function to zero: 
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2 5W fe 

= ^caQfcWi-^CaCr. 1 ^ 

i ij 

= ^2Cf k C lk W t -l (no sum on k). (22) 

i 

i.e., W opt = (C * C a ) _1 l voc . This completes the derivation of (17). 



B. A variational form of the ansatz 



In Eq. (17), we have written the ansatz for the optimal weight function W opt as a matrix equation. There is an 
equivalent formulation as a variational principle which connects to the discussion in §111 A and will also be directly 
useful in later sections: W opt is the weight function which minimizes the expectation value of the pseudo bandpower 

C a 




= pWiWjCjC^ (23) 

ij 

subject to the normalization condition 

^Wi= const., (24) 

i 

This is shown by differentiating Eq. (23) with respect to Wi, obtaining 

P djCijWj = 1 (no sum on i) (25) 

j 

i.e., W opt = (C * C Q ) _1 1 V0C . (Strictly speaking, the RHS in Eq. (25) should be equal to a Lagrange multiplier A, but 
the overall normalization of the weight function is arbitrary.) 

In the previous subsection we showed that W opt is the weight function which makes the pseudo-CV estimator 
approximate the optimal estimator as closely as possible. The variational principle gives another interpretation of 
W opt which matches the discussion from §111 A: W op t is the weight function which minimizes the total aliasing from 
both signal outside the bandpower and noise. Even if known perfectly on average, say because the power spectrum 
on other scales was given, this aliasing will contribute to the noise in the estimator thus degrading the signal to noise. 

The variational principle also gives a simple proof of property #2 at the end of §V, that (A/2 — 1) derivatives of 
W op t vanish at the boundary if the power spectrum is a power law £ x on small scales. From Eq. (10), if n is the 
number of derivatives which vanish, then the contribution to the expectation value (C a ) from small scales (large I) is 
given by 

< 5 "> K /(^ iA " 9B " 4 (26) 

Therefore, n > (A/2 — 1) is a necessary condition for the integral to converge, and so W op t must satisfy this condition, 
since it is chosen to minimize the expectation value. 
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FIG. 2: Optimal window functions W op t(r), given by Eqs. (31), (33) for a signal+noise power spectrum proportional to I 2 (left 
panel) or l 4 (right panel). 

VI. SOME ANALYTIC SOLUTIONS 

To build intuition for the ansatz (17), we give some analytic solutions in the case where the survey region is a 
circle of radius R and the covariance C is given by a power-law power spectrum £ x , where A — 2 or 4. We assume 
uniform noise throughout the survey region, so that the covariance is completely described by a power spectrum, but 
statistical isotropy is broken by the survey boundary. More realistic surveys and power spectra will be considered in 
subsequent sections. We use the flat sky approximation throughout. 



A. Case 1: I power spectrum 



Let us first consider the case where the power spectrum is proportional to £ 2 , so that the operator C is equal to 
(— V 2 ). Computing the operator (C * C a ) involves some subtleties, so we show the details explicitly. In the position 
space basis, the matrix elements of C and C a are given by 



(C) xy = -V 2 6 2 (x-y) 
(C a ) xy = J (£ \x- y\) . 



(27) 



where £q denotes the wavenumber where we are estimating the power spectrum. From the definition (16), the matrix 
elements of (C * C a ) are 



(C * C a )xy 



-W 2 5 2 (x-y)\j (£ Q \x-y\) 
-W 2 S 2 (x-y)+£ 2 S 2 (x-y) 



(28) 
(29) 



where we have used Jo(0) = 1, J'(0) = 0, Jq (0) = —1/2. This calculation shows that the operator (C * C a ) is equal 
to(-V 2 +£ 2 ). 

Combining Eqs. (17), (28), the optimal weight function satisfies the differential equation 



(-V 2 + £ 2 )W opt (x) = const, 
with Dirichlct boundary conditions (as proved in §VB). The solution to (30) is given by 

ur t\ 1 MM 



(30) 



(31) 
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The solutions to (31) are shown in the left panel of Fig. 2. For the A = 2 power spectrum, the weight functions are 
apodized near the boundaries with an apodization length which decreases with increasing l§. This is in contrast to 
the white noise case where the weight function would be uniform, independently of £q, if the noise is homogeneous 
throughout the survey. 



B. Case 2: l 4 power spectrum 

We next consider the case of a signal + noise power spectrum which is proportional to £ 4 . This case corresponds 
to pure B-mode estimation in the noise-dominated limit. 

A calculation similar to (28) shows that the differential equation for the weight function is 

(-V 4 + 4^V 2 -^)VF = const. (32) 

with Dirichlet + Neumann boundary conditions. The solution to (32) is 

w (\ i t+W-rlll ^+ R ^ - t-Ut+r)h (l-R) , . def,„, ^1/2, 

w ^ {r) = 1 - e + i ^R)iai + R)-i-W + R)h(t-R) where l± = (2 ± ^ £ °- (33) 

This solution is shown in the right panel of Fig. 2. Relative to the A = 2 case (left panel), the apodization length is 
larger, and the apodization is such that both W(x) and its derivative vanish on the boundary. 



VII. NUMERICAL SOLUTIONS 



Let us summarize our results so far. We have shown that E — > B mixing in pscudo-Cf power spectrum estimation 
can be eliminated by taking the curl of the map, then estimating the power spectrum of the resulting scalar field. 
Because the curl operation results in a noise power spectrum which grows as Ce oc £ 4 , the weight function must 
be smoothly apodized near boundaries in order to control contamination from small-scale power. This technique is 
mathematically equivalent to the pure pseudo-CV formalism from [12], in which "counterterms" involving spin-1 and 
spin-2 weights are added to the ordinary pseudo-Q estimator to cancel E — > B mixing. We have proposed a general 
ansatz (17) for optimizing pseudo-CV weight functions in the presence of arbitrary signal and noise covariance, and 
constructed analytic solutions in special cases. 

For practical application, one needs a method for solving the ansatz numerically, giving a specification of the noise. 
The most general noise model we will consider is uncorrelated between pixels, isotropic in each pixel, but with an 
arbitrary pixel-dependent amplitude: 

( (Q(x)Q(x>)) (Q(x)U(x')} \ ( a\x) \ 

For this noise model, we have implemented a "black-box" procedure which starts from the noise RMS a(x) in each 
pixel and outputs E-mode and B-mode weight functions in each I band. The procedure uses the variational principle 
from §VB, finding weight functions which minimize the quantity (C a ). In this and the next section, we will describe 
qualitative features of the solutions and study estimator performance, deferring implcmcntational details of the method 
to Appendix C. 

We work in the pure pseudo-Q formalism, with one minor modification. As we have described it in §IV, the spin-1 
and spin-2 weights are always given in terms of the spin-0 piece by: 

Wi(x) = 8 W{x) W 2 {x) =Sd W(x) , (35) 

In our numerical optimization procedure, we do not impose Eqs. (35) as constraints; each S-modc "weight function" 
actually consists of three independent pieces: a spin-0 (scalar) function Wo(x), a spin-1 (vector) field W\{x), and a 
spin-2 (tensor) field W2(x). Note that for EE power spectrum estimation, we do not include higher-spin counterterms, 
and so each i?-mode weight function is simply a scalar W(x). 

In order to differentiate between different flavors of estimators, we will use the following terminology in the rest of 
the paper. If both counterterms are absent (i.e., W\ = W 2 — 0), we will refer to the B-mode estimator as "ordinary 
pseudo-Q " . We will reserve the term "pure pseudo-CV" for the situation where the relations (35) hold exactly, e.g. 
because W(x) is of known analytical form and W±, W 2 are obtained by simply evaluating derivatives in each pixel. 
In this case the i3-mode estimator will have zero E — > B leakage (except perhaps from aliasing artifacts in a finite 
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FIG. 3: B-mode weight functions for varying noise levels, for £ = 20 and a 10° spherical cap with homogeneous noise. As 
descrbed in the text, each weight function consists of spin-0, spin-1, and spin-2 pieces, which we show in the left, center, and 
right columns. The top three rows show the B-mode weight functions produced by our optimization procedure for noise levels 
100, 20, and 5 /iK-arcmin (from top to bottom). In the bottommost row we show the analytic solution given by Eq. (33). 
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FIG. 4: Power spectrum errors for the 10° homogeneous mock survey described in §VII, using pseudo-CV power spectrum 
estimation with counterterms (left/blue) or optimal estimators (right /red). For the B-mode power spectrum, we compute 
power spectrum errors in a fiducial model with T/S = but also show a spectrum with T/S = 0.05 for comparison. 



pixclization) . Finally, in a case where the relations (35) hold approximately, we will refer to the B-mode estimator 
as "pseudo-C^ with counterterms" . This will be the situation for the weight functions produced by our numerical 
optimization procedure; since the relations (35) are not imposed as constraints, they do not hold strictly, and so there 
will be some nonzero mixing in the estimator. However, we will see shortly that the relations will hold approximately, 
and the level of E — > B mixing for optimized weight functions is small in practice. This is because reducing the mixing 
helps minimize the quantity (C a ), so our optimization procedure prefers weight functions which satisfy Eqs. (35) to 
a good approximation. In fact, this is the main reason why we have found it convenient to work in the counterterm 
formalism, rather than passing to the curl \b (which we showed was equivalent in §IV): the variational optimization 
procedure sidesteps implementational issues associated with taking derivatives in an irregular spherical pixelization 
such as Healpix. 

In Fig. 3, we show the result of our optimization procedure for I = 20, a spherical cap shaped survey with radius 10°, 
and three choices of noise level: 100, 20, and 5 /iK-arcmin. For the largest noise level, both higher-spin counterterms 
are small and the weighting is roughly uniform. In this regime, our optimization procedure reduces to ordinary 
pseudo-Cf power spectrum estimation with tophat weighting. For the smallest noise level, the weight function is 
apodized, both counterterms are present, and all components of the weight function have nearly converged to the 
analytic solution (33), which represents the limit EE 3> BB. Here our procedure reduces to pure pseudo-CV with 
weighting that can be understood by solving the differential equation (32). Our optimization procedure therefore 
smoothly interpolates between ordinary and pure pseudo-C^ as the noise level is improved, "turning on" the higher- 
spin counterterms as they are needed to reduce E — > B mixing below the noise floor. (We will illustrate this in a 
different way in the next section.) 

Next we consider power spectrum errors from our estimators for a spherical cap shaped mock survey with radius 
10°, homogeneous noise level 5.75 /iK-arcmin and Gaussian beam #fwhm = 8 arcmin. These values have been chosen 
to roughly model the fiducial realistic survey which will be treated in more detail in the next section. 
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12.04 fiK-arcmin 6.02 5.25 4.72 4.32 4.01 3.76 3.55 



FIG. 5: Survey region, noise distribution, and point source mask for our fiducial realistic experiment. The side length of the 
bounding square is 25° . The map is based on preliminary simulations of the upcoming balloon experiment EBEX. 

We have used an azimuthally symmetric mock survey so that optimal, or maximum likelihood, power spectrum 
estimation will be computationally feasible, even though the pixel count is large enough that this would normally be 
impossible. The computational speedup is obtained because both signal and noise covariance matrices are diagonal 
in m; for details see Appendix F of [12]. This allows us to compare our estimators to optimal in a baseline survey 
which approximates our fiducial realistic experiment (§VIII) within the constraint of azimuthal symmetry. 

In Fig. 4, we show power spectrum errors for the mock survey using both pscudo-CV with weight functions produced 
by our optimization procedure, and optimal power spectrum estimators. It is seen that our method is slightly 
suboptimal at very low £ but rapidly becomes optimal. For example, the S-mode bandpower RMS is 72% optimal 
in the lowest band, 88% optimal in the second-lowest, and 91% optimal in the third lowest. We emphasize that this 
level of performance for i3-modes is much better than could be obtained at this noise level using ordinary pseudo-C^ 
estimators [2] , and that we have obtained it using a completely automated procedure for generating weight functions 
from the noise distribution. 



VIII. A REALISTIC EXAMPLE 

In the preceding section we considered a homogeneous, azimuthally symmetric mock survey whose noise distribution 
was much simpler than a real CMB experiment. We conclude this paper by considering a more realistic example. 
We will use noise maps made from preliminary simulations of the EBEX experiment [22]. The noise map is shown in 
Fig. 5. The simulation used the scan strategy proposed by EBEX and a realistic focal plane configuration to estimate 
the number of hits for every 1 arcminute square pixel after a 14 day long duration balloon flight. The details of the 
scan, focal plane configuration, number of detectors and ultimate sensitivity are not of particular interest here and 
are probably subject to change before the flight takes place. However the simulations give a good illustration of the 
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FIG. 6: Weight functions for the fiducial realistic experiment. Each of the three rows corresponds to a different bandpower, 
with multipole ranges (^mim Imm) given from top to bottom by (30,70), (190,230) and (510,550). Within each row from left 
to right, the _E-mode weight function, the spin-0 piece of the B-mode weight function, and the spin-1 and spin-2 pieces of the 
B-mode weight function are shown. 

level of complexity of the noise maps and sky coverage for upcoming ground and balloon based experiments. Thus 
they serve as a nice test bed for our methods. 

Since the map is not azimuthally symmetric, computing optimal power spectrum errors is computationally pro- 
hibitive and we will not be able to compare our method to optimal. Instead we will compare pseudo-C^ power 
spectrum errors for the fiducial realistic experiment to those obtained for the azimuthally symmetric mock survey in 
the preceding section, which roughly approximates the sky coverage and noise distribution. 

In Fig. 5, we have generated randomly-positioned point sources with average number density 0.04 deg -2 , and 
masked each point source to radius 17 arcmin, corresponding to the 5a level of the beam (assumed Gaussian with 
$fwhm = 8 arcmin). With these parameters, 1% of the survey area is excluded by the point source mask. The area 
that will need to be cut out to sufficiently mask point sources to measure f?-mode polarization is at this point rather 
uncertain. We have based our choice of 1% on point source simulations conducted at SISSA for the Planck satellite 
reference sky [21]. Our choice should thus be seen as an illustrative example to help us understand how our method 
would deal with the resulting holes. We will analyze the map with and without the mask, to isolate the impact of 
point sources when measuring £?-modes. 

For analyzing a very inhomogeneous map such as this, with noise fluctuations on large and small scales, an automated 
procedure for optimizing weight functions is a practical necessity. Using the optimization procedure described in §VII, 
we generated weight functions for both E-mode and _B-mode power spectrum estimation; some representative weight 
functions are shown in Fig. 6. 

Considering _B-mode weight functions first, our method generates weight functions which are more inhomogeneous 
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FIG. 7: A zoomed view of the (€ m in,^max) = (190,230) B-mode weight function near point sources. From left to right, the 
panels show the noise map, the scalar piece of the weight function, and the two higher-spin counterterms. 




FIG. 8: Contribution of aliased B-modes to the pseudo power spectrum Cf B for the fiducial realistic experiment with point 
source mask, with signal and noise power spectra shown for comparison. Using counterterms from our numerical optimization 
procedure, the level of E — > B mixing is well below the noise floor, in contrast to ordinary pseudo-CV . 



than those given by the FKP ansatz (Appendix B) but similar in performance. There is a slight improvement at low 
£; e.g. for the lowest E-mode bandpower, the the bandpower RMS given by our estimator is ~ 10% better than FKP. 
Note that in §VII, we did not discuss B-mode weight functions, since they are very simple for the homogeneous mock 
survey. 

Turning now to B-mode weight functions, one qualitative feature is that their statistical weight is concentrated 
near the center of the survey (compared to the B-mode weight functions). This is for signal-to-noise reasons, since 
the -B-mode signal is weak and the center of the survey is least noisy. 

Another feature of the -B-mode weight functions is that the higher-spin counterterms are relatively smooth across 
the map, even though the scalar piece of the weight function has structure on small scales matching similar structure 
in the noise. Instead, our numerical optimization procedure prefers to associate counterterms with "large scale" 
features such as survey boundaries or point sources, but not with small-scale features in the noise. From Fig. 6, one 
sees that the counterterms are sourced mainly by boundaries at low t and by point sources at intermediate to high £. 
A zoomed-in view of the B-mode weight function near point sources is shown in Fig. 7. The behavior of the weight 
function near the "internal" boundaries of point sources is similar to the behavior near external survey boundaries 
seen in Fig. 3: the scalar piece of the weight function is apodized, and the counterterms are large near the boundary. 

One consequence of the counterterms being smooth, even though the scalar piece of the -B-mode weight functions 
have small-scale structure, is that the relations (35) are not strictly satisfied. Therefore, the estimator is not completely 
pure; there is some nonzero level of E — ► B mixing. This is quantified in Fig. 8, where we show the mean contribution 
of aliased E- modes to the pseudo power spectrum Ci, In the pseudo-CV construction, this contribution is always 
removed in the mean by the debiasing step, but acts as a source of extra variance. It is seen that our optimization 
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FIG. 9: Comparison of three sets of pseudo-Cf power spectrum errors for the fiducial realistic experiment. Weight functions 
were optimized independently in each of the three cases. Left/blue: Pseudo-CV with counterterms, without the point source 
mask. Center/magenta: Pseudo-CV with counterterms, with the mask. Right/red: Ordinary pseudo-CV, with the mask. For 
i?-mode power spectrum errors, the three cases are so similar that only one set of error bars is shown. 



procedure produces counterterms which reduce E —> B mixing well below the level of the noise, even though strictly 
speaking, the mixing is nonzero, i.e. the estimator is not pure. (In Fig. 8, some "stair-step" features can be seen 
which arise because we do not generate a separate set of weight functions in every band, e.g. we use the same weight 
function between I = 510 and I = 830 to save CPU time.) 

Next we study the power spectrum errors which are obtained using these weight functions. In Fig. 9, three sets of 
power spectrum errors are compared. Considering the leftmost set of error bars first (i.e. pseudo-C^ with counterterms 
and without the point source mask), we would like to know, how do these errors compare to optimal? As previously 
remarked, we are unable to make this comparison directly since the fiducial experiment lacks azimuthal symmetry. 
However, the pseudo-CV errors for the fiducial realistic experiment can be compared to the pseudo-CV errors which 
were obtained previously (Fig. 4) for an azimuthally symmetric mock survey with roughly the same sky coverage and 
noise level. We find that the two are roughly equal; the RMS bandpower errors for the fiducial realistic experiment 
for both EE and BB are ~ 10% worse than the mock survey at low t and ~ 10% better at high I. It seems plausible 
that this difference is simply due to inhomogeneity in the noise, and that our method remains near-optimal for the 
fiducial realistic experiment, although we cannot say this with absolute certainty since direct computation of the 
optimal errors is not feasible. 

Now let us compare the errors with and without the point source mask, for pseudo-CV estimators with counterterms 
(i.e., the left and middle error bars in Fig. 9). We find that the point source mask degrades the errors on BB by 
~ 20% for £ < 250 but the effect becomes small for large i, or for EE on all angular scales. Even though a tiny 
fraction of sky area is masked, we find that the impact on i?-mode errors is not negligible. This is to be expected 
since the point source mask does decrease the number of pure B-modes in the survey region. However, even taking a 
conservative estimate for the number density of point sources, we have found that this effect is not large. 

Finally, we compare pseudo-CV with counterterms to ordinary pseudo-CV (i.e., the middle and right error bars in 
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c(^4lens) 


Mode-counting estimate (Eq. (36)) 


0.0032 


0.095 


Homogeneous circular mock survey, optimal estimators (Fig. 4) 
Homogeneous circular mock survey, pseudo-C^ with counterterms (Fig. 4) 
"Realistic" noise map, no point source mask, pseudo-CV with counterterms (Fig. 9) 
"Realistic" noise map, point source mask applied, pseudo-CV with counterterms (Fig. 9) 


0.0039 
0.0043 
0.0045 
0.0056 


0.095 
0.096 
0.082 
0.085 


Ordinary pseudo-CV 


0.0536 


0.258 



TABLE I: Forecasts for the la upper limit on (T/S) and the fractional error on the amplitude A\ Bns of the lensing B-mode, for 
various levels of approximation to the fiducial realistic experiment. 

Fig. 9). It is seen that at this noise level, including counterterms in the estimator is essential for E-B separation. If 
ordinary pseudo-CV estimators are used instead, the RMS -B-mode errors are worse by a factor ~2-3 at intermediate 
to high t, and a factor ~ 10 at low t. If this comparison is repeated without the point source mask, we find that the 
two become equal at high I but remain drastically different at low I. This suggests that the suboptimality of ordinary 
pseudo-Q is mainly due to E — > B mixing from survey boundaries at low I and point sources at high I. 

IX. DISCUSSION 

We have given a general treatment of E — * B mixing and argued that when estimating B-mode power, no in- 
formation is lost by passing to the curl \b- The analogue for a scalar field, passing to the Laplacian, would lose 
the largest-scale modes on a cut sky, but for a spin-2 field it is the ambiguous modes which are lost in this way. 
This presents a general approach to constructing estimators which are pure, i.e. which filter out contributions from 
ambiguous modes. We considered the estimator defined by applying pseudo-C^ estimation to the scalar field xb, and 
proved that it is mathematically equivalent to the pure pseudo-C^ estimator from [12]. The equivalence sheds light 
on properties of the estimator, e.g. the requirement of a smoothly apodized weight function can be understood as 
arising from colored noise in \b- Perhaps most importantly, it allows us to consider ordinary and pure pseudo-C^ 
estimators in a unified way when studying the problem of optimizing the weight function. 

The central result of this paper is a general ansatz for optimizing the pseudo-CV weight function. The ansatz can 
be expressed either in matrix form (Eq. (17)), or as a variational principle: the optimized weight function minimizes 
the total expectation value (C a ) of the estimator, subject to the constraint ^2 X W(x) = 1. Although our emphasis 
has been on next-generation ground-based polarization experiments, and we have only considered noise which is 
uncorrelated between pixels (Eq. (34)), the ansatz is much more general: in principle it applies to arbitrary signal 
and noise covariance. In particular, it should be useful for the temperature power spectrum, but we have not pursued 
this here. 

We have shown our the ansatz can be solved numerically for an arbitrary spatial noise distribution, and studied 
the performance of our estimators for several mock surveys, culminating in a full simulation of a realistic experiment 
with characteristics based on the upcoming balloon borne EBEX: complicated boundaries, inhomogeneous noise and a 
randomly generated point source mask (Fig. 5). This is summarized in Tab. I, where we have compressed the £>-modc 
power spectrum errors obtained for each survey into two numbers: the la upper limit on [T/ S) and the fractional 
error a{A\ cns ) on the overall amplitude of the lensing B-mode. In computing er(j4i ons ), we have used the WMAP3 
best-fit model [14] with as = 0.74; in a different fiducial model it would scale roughly as a(A\ cns ) oc a^ 1 . (We have 
also treated the power spectrum covariance as Gaussian; including non-Gaussianity in the lensing i3-mode is expected 
to give a ~ 10% correction to a(Ai ens ) at noise levels of our fiducial experiment [13] but the effect can become large 
for lower noise.) 

As a baseline, we have shown in the first row of the table the naive "mode-counting" estimate, given by assigning 
uncorrelated errors to each bandpower as follows: 

i / fiBB \ 2 

Cov(A 6 , A'J = -/ sky jyii + 1) [ c bJ +n J ( A b) 2 S w (36) 

where is the noise power spectrum. In the last row, we have shown for comparison the result of using ordinary 
pseudo-Q estimators without counterterms. 

Each row of the table isolates one effect which might be worrying for E-B separation: mixing from the survey 
boundary, suboptimality of pseudo-C^, inhomogeneous noise with small-scale features, and the point source mask. 



19 



Our conclusion is that each of these has a small impact using our method, although it is worth noting that when 
combined, these effects do result in a value of (T / S)\ a which differs from the baseline mode-counting estimate by 
75%. Considering all of these in combination, we have given a complete treatment of "geometric" effects which arise 
from the mask and inhomogeneous noise coupling E and B in a realistic survey. 

It should be mentioned that there are other, non-geometric effects not considered in this paper which will have an 
impact in real CMB polarization experiments, such as (1//) noise, removal of ground-synchronous modes, systematic 
errors, and foreground contamination. In particular, the power spectrum errors we have presented in §VIII should not 
be regarded as a "bottom line" forecast for EBEX, as these effects have not been included. However, geometric mixing 
of E and B has been a practical concern for next generation experiments; we have given a general solution to the 
problem and demonstrated the method for noise distributions with the complexity of a real experiment. Moreover, we 
have shown how to generate optimized weight functions in a completely automated way, thus removing an outstanding 
practical obstacle in the pseudo-C^ method. 
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APPENDIX A: TAKING DERIVATIVES BY FINITE DIFFERENCING 

In §IV, we proved that the pure pseudo-CV method is equivalent to computing the curl \b and then estimating 
the power spectrum of the resulting scalar field. Throughout this paper, we have mainly used the former approach, 
but have also found that the latter can also be used as a practical technique when analyzing noisy maps. Here, we 
describe some implementational details of the method; we will see that this also sheds light on the pixel resolution 
which is needed to control E — > B aliasing. 

Making a xb map out of polarization data involves taking finite differences. The derivative is a local operation so 
to understand the requirements it is easier to work in the flat sky approximation. For this analysis to apply, the flat 
sky approximation should be valid over the scale of a pixel, which is almost always true. 
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In this case we have: 



X e = (d 2 x -d 2 v )Q + 2d 2 U 



XB = 01 - 8 2 y )U - 2d 2 xy Q. (Al) 

In the discrete case we take the partial derivatives by doing linear combinations of neighboring pixels. To isolate B 
around a point x we consider a combination, 



xs = J2 w P- p p> ( A2 ) 



where the sum over p is a sum over neighboring pixels and P = (Q, U). The strategy to determine the weights is to 
consider all Fourier modes, one at a time, and try to reduce the E leakage they produce. We demand that for the 
mode of wavevector 1: 

XB = -k 2 e a -*[B(\)(l + a B (l)(kAr)P • • •) + £7(l)(a B (l)(A:Ar)« + • • ■)], (A3) 

where Ar measures the separation to the closest pixel and cie,b are order one coefficients which depend on the 
orientation of 1 relative to the grid. The aim is to make p and q as large as possible. The weights of course are 
independent of 1 but will depend on the shape of the grid around the point being considered. The grid will also 
determine the coefficients cle.b- 

For a square grid one can obtain the following results: using the 8 neighboring pixels (the three by three square 
with the point of interest at the center) one can get (p,q) = (2,2), using 16 one can get (p, q) = (4,4) and with 24 
neighbors (a 5 by 5 square centered on the pixel) one gets (p, q) — (2, 6). Due to the symmetries of the grid one can 
parametrize the weights in the above cases by up to six coefficients, w\, ■ ■ ■ , w§. For 8 neighbors only w\ and w 2 are 
needed, for 16 w\ through wa, and all of them are used for the 24 neighbor case. The required linear combinations 
are: 

XB,(i,j) = W\{Ui-\ t j + Uij-i - u i+1J - u hj + 1 ) 

+ W2(qi+i,j+i + qi-ij-i — qi-i.j+i — (ft+ij-i) 

+ W3{Ui- 2 ,j + Uij-2 - U i+2 ,j - U hj+2 ) 

+ W i (q i+2 J+2 + <li-2,j-2 — <?i-2,j+2 - ?i+2,j-2) 

+ W 5 (u i+ 2,j-l + Ui+l,j_2 + Uj_l,j+2 + Ui-2,j+l 
^ u i+2,j + l — Uj+lJ+2 — Ui-l,j-2 — Ui-2,j-l) 

+ w 6(<li+2,j+l + Qi+2,j-l + Qi-2,j+l + Qi-2,j-l 

^ Qi+l,j+2 — Qi-l,j-2 — Qi+l,j-2 — Qi-l,j+2) (A4) 

with Wi shown in table II. The coefficients for xe can be obtained from the above by simply rotating the polarization 
by 45 degrees. 

For the lowest order the weights are exactly what one expects for those second derivatives. As one increases the 
number of neighbors the order up to which one can suppress the leakage increases. To hrst and second orders the 
answers we present are unique. For the third order case the requirement to have (p, q) — (2, 6) does not determine 
the weights but leaves one degree of freedom. Than freedom is not enough to increase either p or q so we made the 
choice that minimized (a 2 B ). 

Figure 10 shows the leakage power spectra for a square grid with 1 arcminute pixel separation. What is plotted is 
the xb power spectrum times (I — 2)1/ (I + 2)! in the absence of cosmological i?-modes. It is clear that in this case 
using just the nearest neighbors to take the derivatives is sufficient, except perhaps at relatively large I for the lensing 
signal. In that case an extra order might be required. Note for example that for an experiment like EBEX the leakage 
power is well below the noise for all relevant scales. It is also apparent that if one is after the gravity wave signal the 
requirements are significantly relaxed. In fact for T/S = 0.01 and lowest order derivative, the leakage power spectrum 
is a factor of ten below the gravity wave signal al I = 100 even for pixel separations of 10 arcminutes. 



APPENDIX B: RELATION TO FKP 



In this appendix we prove the assertion, stated without proof at the end of §V, that our ansatz for the optimal 
weight function reduces to the FKP [4] approximation under appropriate conditions. The FKP approximation is used 
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FIG. 10: Expected B-mode power spectrum including the lensing signal and GW with T/S = 0.1 compared to the level of 
leakage for three different numbers of neighbors. For the leakage we plot power spectrum and \b in the absence of any B-mode 
signal times (I — 2)!/(Z + 2)!. If Ar gives the separation between neighboring pixels in the grid, the successive approximations 
to the derivatives scale as Ar 2 , Ar 4 and Ar 6 . 



TABLE II: Weights used in equation (A4) to calculate \b for three different choices for the number of neighbors used. The 
last four columns quantify the level of residual contamination using the quantities in equation (A3). 
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to incorporate the effect of inhomogeous noise when estimating the amplitude of modes that are much shorter than 
the size of the survey and also of the typical variation scale of the noise. FKP propose a weight 

W - KP -W^FW,- (B1) 

where Iq is the wavenumber where the power spectrum is being estimated, fl is the solid angle of each pixel, and of the 
noise variance in pixel i. Working in the flat sky approximation, we will now show that under the FKP assumptions, 
Wf KP satisfies Eq. (25). 

Assuming that we are estimating power in a narrow band a near Iq we have, 

Cij = J J^Qe-^-^. (B2) 

C a = f ^j2. e -ilo-(xi-x 3 -) 



2tt 
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where 1 = Zo(cosv?i ,sin<£i ). Plugging this into Eq. (25) we get, 

E - E + / ^o.-*-) (/ ^.-**->) jjt^: < B3 > 

The first term, coming from the noise contribution to the covariance, simply gives: 

(B4) 



To make progress on the second term we need to use some of the FKP assumptions. First we will use that we are 
interested in a situation in which the noise varies slowly and write, 

2 T Lir = = /(**) + WteXxj - x,) + • • • , (B5) 

so that the second term in (B3) becomes 

£ / ^^ C '°^ H "-*KfM + Vf<*)to - + ( B6 ) 
For the first of these terms we get: 

- w^cr, +ow '° R> <B7) 

where R is the size scale of the patch. Now let us check what happens with the additional terms in Eq. (B6) describing 
anisotropy of the noise. The additional contribution is proportional to: 

V x /-V, C ;o , (B8) 

which is suppressed by 1/IqRn with respect to the leading term where Rn gives the scale of variation of the noise. 
Putting all of this together, we see that 

]T C ii C? j Wf KP = 1 + O(l/l R) + O(1/1 Rn) (B9) 

3 

i.e., for diagonal inhomogenous noise when considering modes much shorter than both the scale of variation of the 
noise and the patch, our ansatz (17) for the pseudo-Q weight function reduces to FKP. 

APPENDIX C: NUMERICAL OPTIMIZATION OF WEIGHT FUNCTIONS 

In this appendix we supply the details of our procedure, described briefly in §VII, for numerically optimizing pseudo- 
Ci weight functions given the noise RMS a(x) in each pixel. We first treat the more difficult case of optimizing the 
-B-mode weight function; as we will then see, optimizing the £?-mode weight function can be obtained as a special 
case. As described in §VII, the .B-mode "weight function" really consists of three functions W(x), W\{x), ^(i) with 
spins 0,1,2 respectively. These are varied independently in order to minimize the total expectation value (C a ) of the 
pseudo-Q estimator (with contributions from signal and noise), subject to the normalization constraint 



J2W(x) = l. (CI) 

X 

To solve this minimization problem, we first rewrite the expectation value (C a ) to be minimized in a form which 
makes the dependence on the weight functions easier to understand. Considering first the noise contribution to (C a ), 
a short calculation shows that 

- i e w, ^ (ww + « l J^e 5J + (C2) 
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where We denotes the £ weighting within the bandpower under consideration. (Throughout this paper we have taken 
this to be We = £(£ + l)/(2n) inside the band and zero outside the band, as in Eq. (12).) 

The signal contribution to (C a ) is more difficult to compute, but follows from the results of Appendix D in [12]. In 
order to write it down, we decompose the pieces of the weight function in spherical harmonics: 



l lm 



iY? m {x)-- 1 Y£ n {x) 



Wi(a 



1 Y e * m (x) + -iF/ m 0r) 



^£m( X ) ~ -2^m( X ) 



(C3) 
(C4) 
(C5) 



(The spin-2 piece (C5) is the E/B decomposition from Eq. (2), whereas the spin-1 piece (C4) is the gradient /curl 
decomposition of a vector field.) 

The signal contribution to (C a ) can then be written in the following form: 



r<ww c*wg r'WE \ 

e ^ 'e £ \ 
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n W 

u £m 
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tm 
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( C* B*\ 
\ a lm a £m J 



C? c Cf* \ ( oF lm 
C? B C? B I 



(C6) 



Here, the power spectra which appear depend on the bandpower weighting We and signal power spectra. For the 
latter, it is convenient to introduce the notation = (C EE,S1S ± C EB ' S1S ). The power spectra in Eq. (C6) are then 
given by: 
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(C7) 
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where dg S /(z) is the reduced Wigner rotation matrix element [16]; note that rfoo( z ) i s J us t the Legendre polynomial 
Pe(z). We have written the power spectra as position-space integrals following [12]; if convenient these expressions 
can be converted to harmonic space using the identity 



/: 



dzdi± s ,(z)dl ±2 (z)d e 2 _s,±2 T s>(z) = 2 



£ £' £" \ I £ £' £" 
s -2 2-s j \ s' t2 ±2ts' 



(C8) 
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Armed with the expressions (C2) and (C6) for the noise and signal contributions to (C a ), a procedure for minimizing 
(Ca) = (C , a)si g + (C a )noiso subject to the constraint (CI) is not difficult to come by. Representing the three components 
of the 5-mode weight function by a single length- (5-/V pix ) vector w, both (C a )sig and (C a ) no i sc are quadratic in w, so 
we can write them compactly as 



(C a ) = (C a ) sig + (C Q ) noiso = w T Q s w + w T Q 1 



w 



(C9) 



where Q s and Q n are (57V p i x )-by-(5A r p i x ) matrices. In this notation, the constraint (CI) can be written v T w = 1, 
where v is a vector with -/V p ; x entries equal to 1 (corresponding to the spin-0 piece of the weight function) and (47V p i x ) 
entries equal to (corresponding to the higher-spin pieces). The solution to this minimization problem is simply 



w, 



opt 



(Qs + Qn^V 

v T (Q s + Q„)- 1 v 



(CIO) 



Since dense inversion of a (5-/V p j x )-by-(5./Vpi x ) matrix is computationally infeasible, our approach is to compute (Q s + 
Qn) _1 w using conjugate gradient inversion [9]. 

In order to use conjugate gradient inversion, two ingredients arc needed. First, one needs an efficient way to multiply 
a vector by (Q s + Q n ), even though this matrix will be too large to store in dense form. This is easily obtained from 
Eqs. (C6) and (C2); note that multiplying by Q s requires transforming from pixel to harmonic space and back. The 
second ingredient is a prcconditioner, or approximate inverse of (Q s + Q n ), used to speed convergence; the better the 
preconditioner approximates (Q s + Q n ) , the more rapidly the conjugate gradient search will converge. 

We use the following simple preconditioner, obtained by keeping only the diagonal of (Q s + Q n ), then inverting: 



(W (x) 
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(Cll) 



(C12) 



47T 

2i+ 1 

47T 



(Cf G + Cf c ) 

(cr+cf B ). 



The performance of this preconditioner depends mainly on the noise level. For noise levels > 20 /zK-arcmin, only a few 
CG iterations are needed; for the noise levels of our fiducial experiment (~ 5.75 ^K-arcmin), around 100 iterations are 
needed; and for noise levels < 1 ^K-arcmin, the CG search does not converge. A better preconditioner may accelerate 
weight function optimization, but we have not attempted to find one in this paper. 

In summary, we have now arrived at a method for numerically optimizing the £>-mode weight function, given the 
bandpowcr (specified by the t weighting We) and noise RMS a(x). One precomputes the power spectra in Eqs. (C7), 
then computes (Q s + Q n )~ 1 v by conjugate gradient inversion, using the preconditioner (Cll), to obtain the optimized 
weight function given by (CIO). 

Finally, let us discuss the completely analagous but simpler problem of optimizing the iS-mode weight function. In 
this case, the weight function is just a scalar W(x). The noise and signal contributions to (C a ) are given (in analogy 
to Eqs. (C2), (C6) above) by 



(C a ) noise = i-^Wi a\x)W{xf 



\ya)s\g — a im U £ a lm 



where 



cr w d = f f dz y: ( 2J -np) ^dgo(*)[cj4,(^(*)+c?4;_ a (z)4;i a (*) 

t' i" 



(C13) 
(C14) 

(C15) 
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Interpreting these as an iVpj x -by-iVpi x matrix equation 

(C a ) = (C«) noisc + (C Q ) sig = w T Q n w + w T Q s w (C16) 
the optimized -E-mode weight function is given by 

Wopt - vT(Q. + Q n )-iv ■ (C17) 

where v is a length-7V p i x vector consisting of all l's. Eq. (C17) is evaluated using conjugate gradient inversion with 
preconditioner given by W(x) — > W{x)/(<Jq + a(x) 2 ). 



